ABSTRACT

Preliminary data from the High School and Beyond (HSB) research study
are described in order to assist bilingual education researchers in
understanding what information is available. The HSB project design
included a highly stratified national probability sample encompassing
30,000 sophomores and 28,000 seniors enrolled in 1,015 public and
private high schools. The study seeks to observe the educational and
occupational plans and activities of high school students as they pass
through the American educational system. The nature of the various data
files is described, including files on students, languages, schools,
teachers' comments, parents, tests, twins, and friends. For example,
the most important file, the student file, contains responses from each
student to extensive questionnaires and various cognitive tests. The
language file contains information distinguishing childhood language
status from present language status, language usage at home versus
language usage outside of the home, and information describing
experience with bilingual education. The constraints of the sample that
limit its generalizability are discussed. It is concluded that, keeping
sample constraints in mind, the HSB data provide an extremely valuable
resource for bilingual education researchers. (RW)
ABSTRACT

Preliminary analysis of the High School and Beyond (HS&B) data set
suggests that, despite its sample constraints, it will be an invaluable
resource for researchers of bilingualism and bilingual education.
The special inclusion of the Hispanic population, the largest language
minority in the U.S., will enable researchers to carry out detailed
analyses on that population. In addition, the various files of the
HS&B data set include needed variables to test the validity of many
of the heated arguments surrounding bilingual education.




THE HIGH SCHOOL AND BEYOND DATA SET: ITS RELEVANCE FOR BILINGUAL 
EDUCATION RESEARCH 

Alvin Y. So

Introduction

Funded by the National Center for Education Statistics (NCES) and
conducted by the National Opinion Research Center (NORC), the High School
and Beyond (HS&B) data set was the first wave of a national longitudinal
study of the cohorts of high school students in the United States in 1980.
The HS&B project design included a highly stratified national probability
sample of over 1,100 high schools with 36 seniors and 36 sophomores per
school. In those schools with fewer than 36 seniors or sophomores, all
eligible students were included in the sample. Cooperation from both
schools and students was excellent. The overall response rate for
schools was 91% and for students, 84%. Over 30,000 sophomores and 28,000
seniors enrolled in 1,015 public and private high schools across the
nation participated in this study. The HS&B sample represents the
nation's 10th and 12th grade populations, totaling about 3.8 million
sophomores and 3 million seniors in more than 21,000 schools in spring,
1980 (Peng et al., 1981, p. ix; NORC, 1980a).

As a large-scale, longitudinal survey, the primary purpose of the
HS&B project is to observe the educational and occupational plans and
activities of young people as they pass through the American educational
system and assume their adult roles (Peng et al., 1981, p. ix). Because
of its excellent sample and questionnaire design, however, the HS&B project
has actually collected much more data than required for its original
purpose. Page (1981, pp. 22-23) describes it as a "priceless national
resource. ... It is an extraordinarily far-sighted project, the
richest resource for research and policy analysis we have had."
Subsequently, many well-known researchers such as James Coleman (1981) have
utilized the HS&B data set to generate publications that have important
policy implications. Recently, several educational journals have devoted
an entire issue to a policy report based on this data set (see, for
example, Harvard Educational Review, 1981; Sociology of Education, 1982).

Despite its rich data, however, the HS&B data set has still not
caught the attention of researchers of bilingual education. Except
for the pioneer study by Nielsen and Fernandez (1981) on the
achievements of Hispanic students, no work has been done from the
perspective of bilingualism or bilingual education utilizing this rich
resource.

Because bilingual education researchers are largely unfamiliar with the
HS&B data set, we have conducted preliminary analyses of the data
and are documenting the data set in this technical note to introduce it
to our fellow research workers in bilingual education. In what
follows we shall first describe the nature of the various data files
contained in the HS&B data set. Then we shall point out why this data
set is particularly useful to bilingual education researchers, noting
for the reader the constraints imposed by this data set in carrying out
bilingual education research.

The Data Files and the Variables

In order to collect data from as many different resources as 
possible, the HS&B project distributed several sets of questionnaires 
to various individuals. The data collected were then stored in different 
computer files, as presented in Table 1. We shall briefly describe each 
of these files in the following sections. 

The student file. The most important file in the HS&B data set,
the student file contains responses from each student in the sample
to a fairly extensive questionnaire and to various cognitive tests.
Consequently, this file contains responses from all the 58,000 students
in the HS&B sample and includes as many as 638 variables. A summary
listing of the variables in this file is as follows:

• High School Experience Variables (curriculum placement, courses
taken, grades and homework, vocational training, students'
opinion of the school)

• Activities Outside of School Variables (working for pay,
organized group activities, other leisure activities)

• Values and Attitudes Variables (life goals, factors in
educational and occupational choices, national services)

• Plans of High School Seniors Variables (short-range plans,
long-range plans)

• College Plans Variables (criteria for choosing a college,
financial aid, expected field of study)

• Achievement Tests Variables (vocabulary, reading, mathematics,
picture-number, mosaic comparison, visualization in three
dimensions)

Table 1: A List of the Data Files in the HS&B Data Set

                                                                   Number of Variables
Name of the File             Number of Cases in the File           in the File

The Student File             58,000 students                       638
The Language File            11,000 students with non-English      -
                             language experience
The School File              988 schools                           237
The Teachers' Comment File   143,000 teacher observations          30
The Parent File              7,000 parents                         307
The Test File                53,000 students                       2^8
The Twin File                500 twins                             6^0
The Friend File              36,000 one-way friendship linkages    not specified



The language file. If a student reported some non-English language
experience either during childhood or at the time of the survey, the
student was requested to complete an additional set of questionnaires
on language experience. About 11,000 out of a total of 58,000 students
answered the language questionnaire; their responses were included in the
language file. A summary listing of the variables in this file is
as follows:

• Language Status as a Child 

• Present Home Language Variables

• Self-Assessed English and Other Language Proficiency (understanding,
speaking, reading, writing)

• Present Language Usage (at home, at school, at work, at store)

• Experience with Bilingual Medium of Instruction in Grades 1-6,
7-9, 10-12

• Courses Taken (English as a Second Language, reading/writing,
math/science courses taught in other language, ancestry history)

Since this file will be of most interest to bilingual education researchers, 
a brief description of the sample characteristics of this file is presented 
in Table 2. This language file contains responses from 5,120 Hispanics, 
3,763 Whites, 663 Asians, 203 American Indians, and 162 Blacks; about 
1,100 students in this file did not answer the question on either descent
or mother tongue. Of all the ethnic groups in this language file, the
highest percentage (67%) of non-English mother tongue students was found
among Asians, followed closely by Hispanics (62%), then by American
Indians (39%) and Whites (20%); Blacks had the lowest percentage (only 8%)
of non-English mother tongue students.

Table 2: Sample Characteristics of the Language File

                                     Ethnic Groups

Mother Tongue     Hispanics   Whites    Asians   Am. Indians   Blacks   Total

English           38%         80%       34%      62%           92%      55%
Spanish           61%         2%        2%       3%            4%       32%
Other Language    1%          18%       65%      36%           4%       13%

Total %           100%        100%      101%     101%          100%     100%
(N)               (5,120)     (3,763)   (663)    (203)         (162)    (9,911)
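
The percentages quoted in the text can be re-derived from Table 2. The following is only a minimal sketch: the dictionary layout and the function name are our own illustration, not part of the HS&B files, and the cell values are transcribed from Table 2 as printed (rounded percentages).

```python
# Table 2 column percentages by mother tongue, keyed by ethnic group.
# Values are transcribed from Table 2; the structure is illustrative only.
TABLE_2 = {
    "Hispanics":        {"English": 38, "Spanish": 61, "Other": 1,  "n": 5120},
    "Whites":           {"English": 80, "Spanish": 2,  "Other": 18, "n": 3763},
    "Asians":           {"English": 34, "Spanish": 2,  "Other": 65, "n": 663},
    "American Indians": {"English": 62, "Spanish": 3,  "Other": 36, "n": 203},
    "Blacks":           {"English": 92, "Spanish": 4,  "Other": 4,  "n": 162},
}


def non_english_share(row):
    """Percentage of a group whose mother tongue is not English."""
    return row["Spanish"] + row["Other"]


# Non-English mother tongue percentage for each group.
shares = {group: non_english_share(row) for group, row in TABLE_2.items()}

# Groups ordered from highest to lowest non-English share.
ranking = sorted(shares, key=shares.get, reverse=True)

# Total number of students with a reported descent and mother tongue.
total_n = sum(row["n"] for row in TABLE_2.values())

# Pooled English share, weighting each group's percentage by its size.
pooled_english = sum(row["English"] * row["n"] for row in TABLE_2.values()) / total_n

for group in ranking:
    print(f"{group}: {shares[group]}% non-English mother tongue")
print(f"Total N: {total_n}, pooled English share: {pooled_english:.1f}%")
```

Because the cells are rounded, pooled figures recomputed this way may differ from the printed Total column by a point.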
The school file. The administrator in each selected school in the
HS&B sample was requested to complete a questionnaire about the school;
their responses are included in this school file. This file provides
information about the social context in which the students receive their
high school education. All in all, 988 school administrators responded
to questions containing some 237 variables. A summary list of the
variables is as follows:

• School Facilities Variables (library volumes, indoor lounge,
departmental office, student cafeteria)

• School Educational Characteristics (highest/lowest grade offered,
total membership, length of school year, average daily attendance,
number of graduates)

• School Ethnic Composition Variables (percentage of American Indian,
Asian, Hispanic, Black, White students and faculty)

• School Social Environment Variables (student absenteeism, cutting
classes, parents' lack of interest, teacher absenteeism, robbery,
drugs, rape, vandalism)

• School Financial Situation Variables (per-student expenditure,
percentage of funds from tuition, from fund-raising, from religious
subsidy, annual tuition, legal ownership)

• Teacher Characteristics (percentage female, percentage MA,
average pay, salary steps, teaching experience)

• Language Courses Taught (Spanish, German, French, Black Studies
cultural courses, bilingual program, ESL courses, courses taught
in mother tongue)

The teachers' comments file. Teachers in each selected school in
the HS&B sample were asked to make comments on students identified in
the sample. About 14,000 teachers from 611 schools responded on about
17,000 students. Since a teacher could make comments on one or more
students, there were a total of about 143,000 teacher observations in
this file. A partial list of the 30 variables in this file is as
follows:

• Classes Taught by Teacher (English, art, history, etc.) 

• Social Background of the Teacher (sex, ethnicity) 

• Teacher's Knowledge of the Student (had student in class, 
know student, know parent) 
• Evaluation of Student's Performance (student working up to
potential, will probably go to college, seems to dislike
school)

• Comments on Student's Social Traits (seems popular with others,
emotional handicaps, self-discipline to hold a job)

The parent file. About 7,000 parents of the students in the HS&B
sample were selected to complete another set of questionnaires containing
their views on high school education. A list of the 307 variables in
this file is as follows:

• Parent's Social Background Variables (sex, ethnicity, education,
occupation, industry, language status, social mobility)

• Parent's Communication with Students (talk to students in
grades 6-7, 8-9, 10-11, 12)

• Parent's Expectation of Students' Educational and Occupational
Achievements

• Parent's Ability to Finance College Education

• Parent's Actual Involvement in College Planning (talking to
counselors, reading pamphlets, talking to other parents)

Finally, there are the test file, the friend file, and the twin
file, which include a battery of cognitive tests, friendship linkages,
and information on twins, respectively. Since these three files may
be of less interest to bilingual education researchers, we shall not 
review them here. Interested readers can consult the codebooks or 
news releases for further details (NCES, 1982a, 1982c; NORC, 1980a, 
1980b, 1980c). 

The Relevance of the HS&B Data Set to Bilingual Education Research

The HS&B data set is particularly useful to researchers in
bilingualism and bilingual education because of its excellent language
file. According to Nielsen and Fernandez (1981, p. 3), the language
file contains a language questionnaire that is even superior in quality
to that in the 1976 Survey of Income and Education national data set.
First, the language file distinguishes childhood language status
from present language status, thus permitting researchers to study
the rate of language shift in the present generation of high school
students. Second, the language file also distinguishes language usage
at home from usage outside the home, and distinguishes oral proficiency
(speaking, listening) from literacy (reading, writing). These finer
distinctions enable researchers to study in more detail the actual
patterns of language shift in these four important language domains.
Third, the language file includes information on experience with a
bilingual medium of education and on types of language courses taken in
schools. This kind of language information allows researchers to classify
types of bilingual education programs and to investigate their differential
impacts on language shifts.
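
One way to operationalize the rate of language shift is the share of students raised in a non-English language who now use English at home. The sketch below assumes hypothetical field names (`childhood_language`, `present_home_language`) and uses invented toy records; neither the names nor the values come from the HS&B codebooks.

```python
def language_shift_rate(records):
    """Share of students with a non-English childhood language
    whose present home language is English.

    Returns None when no student started in a non-English language.
    """
    non_english_start = [
        r for r in records if r["childhood_language"] != "English"
    ]
    if not non_english_start:
        return None
    shifted = sum(
        1 for r in non_english_start
        if r["present_home_language"] == "English"
    )
    return shifted / len(non_english_start)


# Toy records for illustration only; these are not HS&B data.
toy_records = [
    {"childhood_language": "Spanish", "present_home_language": "Spanish"},
    {"childhood_language": "Spanish", "present_home_language": "English"},
    {"childhood_language": "Other",   "present_home_language": "English"},
    {"childhood_language": "Spanish", "present_home_language": "Spanish"},
    {"childhood_language": "English", "present_home_language": "English"},
]

rate = language_shift_rate(toy_records)
print(f"Shift rate among non-English starters: {rate:.2f}")
```

The same function could be applied separately to usage at home and usage outside the home, or to oral proficiency and literacy, to compare shift across the four domains.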

In addition, when the language file is merged with other files in the
HS&B data set, the newly merged file provides important data that can open
up new frontiers in bilingual education research. For instance, the
merged language-student file will allow researchers to study the social
background of language minority students, their experience in U.S.
high schools, and their educational achievement in comparison with
non-language-minority youths.

Another example is the merged language-school file, which will
enable researchers to study language minority youths from a holistic
perspective. The new language-school file will tell us, for example,
which types of schools most language minority students attend, what
the ethnic composition of the students and the social environment
in those schools are, and what kinds of language courses are offered.
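
The merging described above amounts to joining files on a shared identifier. The following is a minimal sketch using plain dictionaries; the key names `student_id` and `school_id` and all other field names and values are hypothetical, chosen only for illustration, and the actual identifiers should be taken from the HS&B codebooks.

```python
def merge_on(left_rows, right_rows, key):
    """Inner-join two lists of dict records on a shared key.

    Only left rows with a match in right_rows are kept; on any
    field-name clash the left row's value wins.
    """
    right_index = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is not None:
            merged.append({**match, **row})
    return merged


# Toy records for illustration only; these are not HS&B data.
language_file = [
    {"student_id": 1, "mother_tongue": "Spanish"},
    {"student_id": 3, "mother_tongue": "Other"},
]
student_file = [
    {"student_id": 1, "school_id": 10, "grade": 10},
    {"student_id": 2, "school_id": 10, "grade": 12},
    {"student_id": 3, "school_id": 20, "grade": 12},
]
school_file = [
    {"school_id": 10, "bilingual_program": True},
    {"school_id": 20, "bilingual_program": False},
]

# Language-student file: language records enriched with student data.
language_student = merge_on(language_file, student_file, "student_id")

# Language-school file: attach school characteristics via school_id.
language_school = merge_on(language_student, school_file, "school_id")

for row in language_school:
    print(row)
```

An inner join keeps only students who appear in the language file. A comparison with non-language-minority youths would instead start from the student file and keep unmatched students as well.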

In addition to the rich number of variables it contains, the HS&B
data set is also valuable to bilingual education in that it includes
information on the largest language minority in the U.S., i.e., Hispanics.
Rarely has a national survey on high school education paid sufficient
attention to the issues facing the Hispanic language minority. Thus,
the HS&B data set may be the first national project that aims to include
adequate Hispanic respondents in its sample.
To achieve this aim, the HS&B project designed a special stratum
of 136 Hispanic schools. Further, out of its 58,000 student sample,
the HS&B data set included about 6,700 Hispanic students. Since
oversampling would affect the representativeness of the HS&B sample,
weights were assigned to each student in the sample in order to balance
out the oversampling effect. These weights were calculated to
reflect differential probabilities of sample selection and to adjust
for nonresponse. In this respect, the HS&B data set remains a nationally
representative study whose sample characteristics can be used to
draw inferences about the U.S. student population.
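
To see how weights balance out oversampling, consider a weighted proportion in which each student counts in proportion to the number of students he or she represents. This is a simplified sketch with invented numbers: the weights here are plain inverse selection probabilities, whereas the actual HS&B weights also adjust for nonresponse, and the field names are hypothetical.

```python
def weighted_share(records, flag, weight="weight"):
    """Weighted proportion of records for which `flag` is true."""
    total = sum(r[weight] for r in records)
    hits = sum(r[weight] for r in records if r[flag])
    return hits / total


# Toy population of 1,000 students, 100 of them Hispanic (10%).
# The toy sample takes 50 from each stratum, so Hispanics are oversampled.
# Weight = stratum population size / stratum sample size.
sample = (
    [{"hispanic": True,  "weight": 100 / 50} for _ in range(50)]
    + [{"hispanic": False, "weight": 900 / 50} for _ in range(50)]
)

# Unweighted share reflects the sample design, not the population.
unweighted = sum(1 for r in sample if r["hispanic"]) / len(sample)

# Weighted share recovers the population proportion.
weighted = weighted_share(sample, "hispanic")

print(f"Unweighted Hispanic share: {unweighted:.2f}")
print(f"Weighted Hispanic share:   {weighted:.2f}")
```

Analyses that ignore the weights would describe the sample as designed, with Hispanic students overrepresented, rather than the national student population.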

Constraints on the Sample which Limit Generalizability

This section of the technical note points to the sample constraints
of the HS&B data set for conducting bilingual education research. First
of all, 8,278 students, or about 12% of the originally targeted 69,662
student sample, were absent on the day the HS&B survey was conducted
(NORC, 1980a, Table 1). Since this represents quite a large number of
students, it cannot be assumed that all the absentees were sick or were
absent for family reasons. It is highly conceivable that many of these
absentee students were from language minority backgrounds, but there is
no way to really estimate the number of absentee language minority
students. If this assumption is correct, then the HS&B data set has
already lost quite a large number of language minority background
students from the sample.
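
The size of the absentee group can be checked with simple arithmetic. The sketch below uses only the figures quoted in this note; the comparison with the rounded 58,000 participants is our own rough consistency check, not a figure reported by NORC.

```python
# Figures quoted in the text (NORC, 1980a, Table 1).
targeted = 69_662       # originally targeted student sample
absent = 8_278          # students absent on the survey day
participants = 58_000   # rounded number of participating students

# Share of the targeted sample that was absent on the survey day.
absent_share = absent / targeted

# Rough participation share implied by the rounded participant count.
participation_share = participants / targeted

print(f"Absent on survey day: {absent_share:.1%}")
print(f"Participated (rounded count): {participation_share:.1%}")
```

The implied participation share is of the same order as the student response rate reported for the study, which suggests that survey-day absence accounts for most, though not all, of the student nonresponse.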

The second sample constraint, which follows the logic of the above
argument but has more serious consequences, is the high dropout rate
for language minority students. Chan (in press) points out that
the dropout rate for limited-English or non-English speaking children
is about three to four times the rate for English-speaking students.
Similarly, Waggoner (1981, p. 41) reveals that language minority
students are less than half as likely as people with English language
backgrounds to have completed high school or to have attended college.
Nielsen and Fernandez (1981, p. 14) also suggest that among
Hispanic dropouts, 66% had left school before grade 10. These studies
point to the fact that many language minority students with low reading
achievement have dropped out of school before grade 10. Consequently,
the HS&B data set at best includes only those students who are talented
or determined enough to survive through high school beyond the 10th
grade.

Due to the above filtering processes of absenteeism and dropping
out, the third sample constraint, which necessarily follows, is the
conspicuous absence of non-English speaking language minority
students in the HS&B data set. A simple fact is that if a student really
is non-English speaking, that student could not make it to grade 10 and
show up on the HS&B survey day. Consequently, when students were asked
for their self-assessed English ability on the HS&B questionnaire,
almost no one in the sample replied that he/she did not understand
English. Indeed, one has to understand what is written on the English
HS&B questionnaire at least well enough to circle the right
answer, "no English ability at all." Consequently, only 56 out of
58,000 students answered the questionnaire in Spanish. And of these 56
students, only 11 showed up in the language file, a fact that continues
to puzzle us.

It is hard to assess what impacts these sample constraints might have
on bilingual education research. We can speculate that results from analyses
of the HS&B data set might tend to overestimate language shift towards
English monolingualism, and to underestimate the educational disadvantages
facing language minority students, because of the large number of students
who were either absent from school on the day of the survey or who had
dropped out. Keeping these sample constraints in mind, however, the HS&B
data set will prove to be an extremely valuable resource for researchers
in bilingual education.